Hierarchical Rule Generalization with Automatically Derived Multi-level Word Classes for MT

Abstract

Hierarchical rule generalization is a central aspect of Chiang’s hierarchical decoding model. “Hierarchical substitution” is the process by which generalized rules are used in decoding. We believe that hierarchical substitutions of semantically similar words are more likely to be grammatically well-formed than hierarchical substitutions of arbitrary words. In this paper, we extend the rule generalization process to use a hierarchy of semantic bilingual word classes, which are derived automatically using a bilingual semantic similarity score. The result is that during decoding, hierarchical substitution of semantically specific classes (e.g., the class of colors) will have a higher probability than hierarchical substitution of more general classes (e.g., the class containing all words). Although we achieved a substantial increase in the number of generalized rules used during decoding, more investigation is required to determine how this will improve MT scores (BLEU) over those generated by the standard rule generalization process.
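The idea that substitution through a specific class should outrank substitution through a general one can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the class names, the example word pairs, and the linear weighting scheme are hypothetical and are not taken from the paper, which derives its classes automatically from a bilingual semantic similarity score.

```python
# Hypothetical class hierarchy: each class maps to its parent; "ALL" is the root.
PARENT = {"COLOR": "ATTRIBUTE", "ATTRIBUTE": "ALL", "ANIMAL": "ALL", "ALL": None}

# Hypothetical most-specific class for a few bilingual word pairs (source, target).
WORD_CLASS = {
    ("red", "rouge"): "COLOR",
    ("blue", "bleu"): "COLOR",
    ("dog", "chien"): "ANIMAL",
}


def class_chain(pair):
    """Return the classes containing `pair`, most specific first, ending at the root."""
    cls = WORD_CLASS.get(pair, "ALL")  # unknown pairs belong only to the root class
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = PARENT[cls]
    return chain


def substitution_probs(pair):
    """Assumed scheme: weight each class level linearly by its specificity, then normalize."""
    chain = class_chain(pair)
    n = len(chain)
    # The most specific class gets weight n, the root gets weight 1.
    weights = {cls: n - i for i, cls in enumerate(chain)}
    total = sum(weights.values())
    return {cls: w / total for cls, w in weights.items()}


probs = substitution_probs(("red", "rouge"))
for cls, p in probs.items():
    print(f"{cls}: {p:.3f}")
```

Under this assumed weighting, a rule slot generalized at the color class receives more probability mass for the pair ("red", "rouge") than a slot generalized at the class of all words, which is the ordering the abstract describes.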

Related resources

Automatically Derived Multi-level Word Classes for MT

Grouping words into semantic classes is a well-studied task in NLP and MT research. The goal of this project was to automatically derive a multi-level hierarchy of bilingual word pair classes, for use in generalized rule extraction. This means that we have a set of k bilingual word pairs in our source and target languages 〈s1, t1〉, 〈s2, t2〉, ..., 〈sk, tk〉, where each sj → tj has been extracted as...

A Study on the Commentary of Historical Verses with an Emphasis on the Rule of Al-Ibrah

One of the prevalent rules for commentary on historical verses that have a specific occasion of revelation and refer to a specific time and place is the rule of al-Ibrah, stated as: consider the universality of the wording, not the particularity of the occasion. The source of this rule is the verses that have universal wording and a particular occasion. The referent of the...

Deriving de/het gender classification for Dutch nouns for rule-based MT generation tasks

Linguistic resources available in the public domain, such as lemmatisers, part-of-speech taggers and parsers, can be used for the development of MT systems: as separate processing modules or as annotation tools for the training corpus. For SMT this annotation is used for training factored models, and for rule-based systems a linguistically annotated corpus is the basis for creating analysis, ge...

Learning Multiple-Nonterminal Synchronous Grammars for Statistical Machine Translation

Recent work in machine translation has evolved from the traditional word- and phrase-based models to include hierarchical phrase-based and syntax-based models. These advances are motivated by the desire to integrate richer knowledge within the translation process to explicitly address limitations of the purely lexical phrase-based model. Generalized phrases as discussed in (Chiang, 2005) attempt ...

Hybrid System Combination for Machine Translation: An Integration of Phrase-level and Sentence-level Combination Approaches

Wei-Yun Ma. Given the wide range of successful statistical MT approaches that have emerged recently, it would be beneficial to take advantage of their individual strengths and avoid their individual weaknesses. Multi-Engine Machine Translation (MEMT) attempts to do so by ei...

Publication date: 2008